COUNTER BASED STRIDE PREDICTION FOR DATA PREFETCH

Technical Field 

This invention relates to the field of electronics, and specifically to a method and
system for predicting a next data location in a process to facilitate a prefetch of data from
that location.

Background Art 

Prefetching is a common technique for minimizing latency in a sequential process. 
Data that is expected to be needed at a future time in the process is retrieved from a 
memory and stored in a cache, for subsequent access at the future time. The cache is 
designed to provide a substantially faster access time than the memory. Thus, when the 
process needs the data, and the data is in the cache, the data is provided to the process at 
the higher cache-access speed. Conversely, if the data is not in the cache, the data is not 
provided to the process until after the substantially slower memory-access time, thereby 
introducing memory-access delays into the process. 

A variety of techniques are commonly available to facilitate the prediction of the 
data that is going to be needed at a future time. One such technique is "stride prediction", 
wherein the location of the next data item to be needed is based on the sequence of 
locations of the prior-accessed data items. For example, data is often stored as an array or
list of data records, such as records of employees' names, addresses, social security
numbers, and so on. Typically, these records are fixed-length records, such that the
beginning of each record is separated by a fixed number of memory locations from its 
adjacent records in the memory. An application that provides a printout or display of all 
employee names, for example, will sequentially access memory locations that are separated 
by this fixed number. Given that the application accesses a first employee name from the 
memory at location L, and accesses a second employee name at location L+S, it is likely 
that the application will access data located at location L+S+S of the memory to obtain the 
next employee name. If the data at location L+S+S is retrieved from memory and stored in 
a higher-speed cache memory before this access is requested, the third employee name can 
be provided to the application more quickly than the first or second employee names. 
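
The arithmetic of this example can be sketched as follows. This is a minimal illustration only; the function name `predict_next` and the values chosen for L and S are hypothetical and are not taken from this disclosure.

```python
def predict_next(prev_addr: int, curr_addr: int) -> int:
    """Predict the next address by repeating the last observed stride."""
    stride = curr_addr - prev_addr   # S = (L + S) - L
    return curr_addr + stride        # L + S + S


# Hypothetical values: records start at L = 0x1000 and lie S = 64 locations apart.
L, S = 0x1000, 64
next_addr = predict_next(L, L + S)
print(hex(next_addr))  # 0x1080, i.e. L + S + S
```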

FIG. 1 illustrates an example flow diagram of a prior art stride prediction prefetch
process. At 110, the prefetch process is invoked, typically at the same time that an
application invokes a request for access to a new data item. Although not illustrated, this
prefetch process may be called as part of a global scheme that includes multiple prefetching
algorithms, and may be selectively invoked depending upon factors such as the proximity
of sequential data accesses, and the like. Typically, a processor maintains a stride
prediction table (SPT) that records information related to executed accesses to the memory,
including the interval between sequential accesses, herein termed the stride of the accesses.
The stride prediction table is typically configured to record multiple sets of information
related to executed accesses to keep track of multiple potential strides. For ease of
understanding, the invention is presented herein using the paradigm of a
single set of information that is used to keep track of a single stride between related
memory accesses. One of ordinary skill in the art will recognize that the principles
presented herein are directly applicable to the conventional use of a stride prediction table
that includes multiple sets of information related to multiple strides.

After invocation at 110, the current stride is determined, at 120, as the difference between
the address of the prior/old access and the address of the new requested access. On the
assumption that the next requested access will be spaced at the same interval as the prior
access, a prefetch is executed, at 130, to fetch the data at an address at the same interval
from the current/new address. At 140, the new address replaces the old address, in
preparation for the next data access, and the prefetch routine terminates, at 150. By
initiating the prefetch for the next-likely data at the time of requested access to the new
data, the prefetched data will be present in the higher-speed cache when and if the
application initiates an access request for this data. If the next-likely data is not the data
that is subsequently requested, the cache will not contain the next-requested data, and a
memory access will be required.
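
The flow of FIG. 1 can be sketched in a few lines. This is an illustrative model only: the class name `AlwaysPrefetch`, its method names, and the sample addresses are hypothetical, and the "prefetch" is represented simply by returning the address that would be fetched.

```python
class AlwaysPrefetch:
    """Sketch of the FIG. 1 flow: a prefetch is issued on every access."""

    def __init__(self, first_addr: int):
        self.old_addr = first_addr

    def access(self, new_addr: int) -> int:
        stride = new_addr - self.old_addr   # 120: determine the current stride
        prefetch_addr = new_addr + stride   # 130: prefetch at the same interval
        self.old_addr = new_addr            # 140: new address replaces old address
        return prefetch_addr


p = AlwaysPrefetch(first_addr=100)
issued = [p.access(a) for a in (108, 116, 300, 308)]
print(issued)  # [116, 124, 484, 316]
```

In this hypothetical trace, the jump to address 300 produces a prefetch of address 484, which is never requested; this is the kind of unwarranted memory traffic discussed next.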

The prefetch process of FIG. 1, however, initiates a prefetch on every access to new 
data, without regard to whether there is any basis for the assumption that the next-likely 
requested data will be equally spaced from the prior-requested data. This causes substantial 
memory access traffic, which can serve to substantially reduce the effective memory 
access. FIG. 2 illustrates an example flow diagram of an improved prior art stride 
prediction prefetch process, wherein a prefetch is initiated if and only if two successive 
accesses exhibit the same stride. In this embodiment, at 220, the determined stride for the 
new access request is compared to the prior determined stride. If the new stride equals the 
old stride, the likelihood that the next stride will also be equal to these two strides is
sufficiently high to warrant a prefetch, at 130. If the new stride differs from the old stride,
the new stride replaces the old stride, at 230, in preparation for the next cycle. One of
ordinary skill in the art will recognize that this two-in-a-row criterion for initiating a prefetch
may be extended to three-in-a-row, four-in-a-row, and so on, to trade off between excessive
memory-access traffic and the likelihood of having the next-requested data in cache.
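
The two-in-a-row check of FIG. 2 can be sketched in the same illustrative style. The class name `TwoInARowPrefetch` and the sample addresses are hypothetical, and an unknown initial stride is modeled as `None`, which is an assumption of this sketch.

```python
from typing import Optional


class TwoInARowPrefetch:
    """Sketch of the FIG. 2 flow: prefetch only when the new stride equals the old stride."""

    def __init__(self, first_addr: int):
        self.old_addr = first_addr
        self.old_stride: Optional[int] = None   # assumption: no stride known at start

    def access(self, new_addr: int) -> Optional[int]:
        stride = new_addr - self.old_addr        # 120: determine the current stride
        prefetch_addr = None
        if stride == self.old_stride:            # 220: compare with the prior stride
            prefetch_addr = new_addr + stride    # 130: prefetch
        else:
            self.old_stride = stride             # 230: new stride replaces old stride
        self.old_addr = new_addr                 # 140: new address replaces old address
        return prefetch_addr


p = TwoInARowPrefetch(first_addr=100)
issued = [p.access(a) for a in (108, 116, 124, 300, 308, 316)]
print(issued)  # [None, 124, 132, None, None, 324]
```

In this trace, the single out-of-pattern access to address 300 suppresses the prefetch for two consecutive accesses, because the stride must be re-learned before prefetching resumes.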

Description of the Invention 

It is an object of this invention to improve the likelihood of prefetching data that 
will subsequently be accessed by an application. It is a further object of this invention to 
provide an efficient prefetch scheme that is well suited for hardware implementation. 

These objects and others are achieved by a prefetching system that includes 
hysteresis in the determination and modification of a stride value that is used for 
prefetching data in a sequential process. Once a stride value is determined, intermittent 
stride inconsistencies are ignored, and the stride value retains its prior value. When the 
stride inconsistencies become frequent, the stride value is modified. When the modified 
stride value becomes repetitive, the system adopts this value as the stride, and subsequent 
stride inconsistencies are again ignored, and the stride value thereafter retains its current
value until inconsistencies become frequent. 



Brief Description of the Drawings

FIG. 1 illustrates an example flow diagram of a prior art stride prediction prefetch process.

FIG. 2 illustrates an example flow diagram of an alternative prior art stride prediction
prefetch process.

FIG. 3 illustrates an example flow diagram of a stride prediction prefetch process in 
accordance with this invention. 

FIG. 4 illustrates an example block diagram of a stride prediction prefetch system in 
accordance with this invention. 

Throughout the drawings, the same reference numerals indicate similar or 
corresponding features or functions. 
Best Modes for Carrying Out the Invention

This invention is premised on the observation that regular stride patterns in memory
accesses often include intermittent data accesses that do not conform to the stride pattern.
For example, a nested loop structure may include an inner loop that cycles through a list of
data, and an outer loop that presets one or more variables that are used in each cycle of the
inner loop, or an outer loop that stores a result from each cycle of the inner loop. In this
example, the inner loop that is cycling through the list of data will likely exhibit a constant
stride. At each re-commencement of the inner loop, however, the memory access is to the
start of the list, whereas the prior access was to the end of the list. The span between the
end of the list and the start of the list will not correspond to the inner loop's
stride. Additionally, a data access by the outer loop at the beginning or end of the loop will
produce a span between accesses that does not correspond to the inner loop's stride. Other
examples of an intermittent break in stride include the processing of data that is organized
in a multi-dimension array. Typically, the data is processed for a given range along one
dimension, then an index to another dimension is incremented, and the data at this next
indexed other dimension is processed for the given range along the first dimension. The
stride along the range of the first dimension will generally be constant, but the increment of
the index to the next dimension will likely result in an access having a span from the prior
access that does not match the stride along the first dimension.
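
The multi-dimension case can be illustrated with a short sketch. It assumes a hypothetical row-major array; the function name `row_major_trace`, the 8-element row length, the 4-column range, and the 4-byte element size are illustrative choices and are not taken from this disclosure.

```python
def row_major_trace(base, row_len, rows, cols, elem_size=4):
    """Addresses touched when columns `cols` of each row in `rows` are processed.

    Assumes a row-major layout with `row_len` elements per row (hypothetical).
    """
    return [base + (r * row_len + c) * elem_size for r in rows for c in cols]


trace = row_major_trace(base=0, row_len=8, rows=range(3), cols=range(4))
strides = [b - a for a, b in zip(trace, trace[1:])]
print(strides)  # [4, 4, 4, 20, 4, 4, 4, 20, 4, 4, 4]
```

The span is constant within each row, with a single differing span each time the row index is incremented.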

In a conventional stride prediction process, whenever the stride is 'broken' or
interrupted, the process of determining the stride is repeated to determine a new stride.
During the time that the new stride is being determined, prefetches do not occur, and the
application is delayed by memory accesses at each re-start of an inner loop, or each
incrementing of a higher-level index during the processing of a multi-dimension array.

In accordance with this invention, the stride value is preserved during intermittent
breaks in stride. Performing a prefetch of data at the current stride is dependent upon the
number of equal-value strides within a given number of memory accesses, and adjusting
the prefetch value is dependent upon the occurrence of multiple non-equal-value strides. In
a simple example, a prefetch of data may be performed whenever two out of three accesses
have successive equal strides, and a modification of the stride value may be performed
whenever two accesses have successive unequal strides.
FIG. 3 illustrates an example flow diagram of a stride prediction prefetch process in
accordance with this invention. In this example, a count is maintained of the
number of successive strides of the same value, up to a selected maximum. In a preferred
embodiment of this invention, this count parameter is also used to distinguish between
intermittent breaks in stride and an actual, persistent change of stride. It will be evident to
one of ordinary skill in the art in view of this disclosure that an independent parameter
could be employed to count the number of successive non-equal strides.

After determining the current stride, between the new access address and the prior
access address, at 120, the process of this invention compares the current stride with the
prior stride value. If they are equal, the current count is incremented, at
326; if not, the current count is decremented, at 322. In a preferred embodiment, the count
is limited to a value from zero to a maximum count. Blocks 324 and 328 clip the
decremented or incremented count from blocks 322 and 326, respectively, to remain within
these limits.

At 330, the count is compared to an upper limit UL, and to a lower limit LL,
wherein the lower limit LL is preferably less than the upper limit UL. In this example, the
upper limit UL corresponds to the number of equal-stride occurrences required to warrant a
prefetch from the next stride-predicted access location. If the count equals or exceeds this
upper limit UL, a prefetch of data from an address corresponding to the current address
plus the current stride is executed, at 130. For example, if the upper limit UL is a value of
two, the prefetch is not executed, initially, unless two-in-a-row of the same stride value
occurs. If the upper limit is a value of three, the prefetch is not executed until three-in-a-row
of the same stride value occurs, and so on. Thereafter, subsequent equal-valued strides
continue to increment the count up to the maximum, via 326, 328. In a preferred
embodiment, the maximum value is selected as the number of successive equal-value
strides required to conclude that the current stride value is 'reliable'.

Each time that a non-equal stride occurs, the count is decremented, at 322. Thus,
the difference between the maximum count and the current count corresponds to a measure
of 'unreliability' of the current stride value. The lower limit is selected as the value of the
current count that constitutes a sufficient measure of unreliability to warrant a change to
the current stride value, at 230. Generally, as indicated in FIG. 3, if the current stride value
is deemed unreliable, at 230, a prefetch, at 130, is not performed. In like manner, if the
current stride value is sufficiently reliable to warrant a prefetch, at 130, a modification of
the current stride value at 230 is not performed. Thus, the lower limit LL is set to be less
than the upper limit UL, and generally set to be UL-1. Optionally, as indicated by the
dashed connection between the test block 330 and the block 140, the lower limit LL may
be selected relative to the upper limit UL such that the assessed reliability of the current
stride is not sufficient to warrant a prefetch, at 130 (count<UL), while the assessed
unreliability is not sufficient to warrant a modification of the current stride, at 230
(count>LL).
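
The counter-based flow of FIG. 3 can be sketched as follows. This is a minimal model under stated assumptions, not a definitive implementation: the class and attribute names are hypothetical; the default limits (maximum count of three, upper limit of two, lower limit of one) follow the example values recited in the claims; the initial stride is modeled as unknown (`None`) with a count of zero; the prefetch address is formed from the retained stride value, which is one reading of "current stride"; and the count is not reset when the stride value is modified.

```python
from typing import Optional


class CounterStridePredictor:
    """Sketch of the FIG. 3 flow with a single saturating count."""

    def __init__(self, first_addr: int, max_count: int = 3, upper: int = 2, lower: int = 1):
        self.old_addr = first_addr
        self.stride: Optional[int] = None   # assumption: no stride known at start
        self.count = 0
        self.max_count, self.upper, self.lower = max_count, upper, lower

    def access(self, new_addr: int) -> Optional[int]:
        current = new_addr - self.old_addr                     # 120: current stride
        if current == self.stride:
            self.count = min(self.count + 1, self.max_count)   # 326, 328: increment, clip
        else:
            self.count = max(self.count - 1, 0)                # 322, 324: decrement, clip
        prefetch_addr = None
        if self.count >= self.upper:                           # 330: reliable enough
            prefetch_addr = new_addr + self.stride             # 130: prefetch
        elif self.count <= self.lower:                         # 330: unreliable enough
            self.stride = current                              # 230: modify stride value
        self.old_addr = new_addr                               # 140
        return prefetch_addr


p = CounterStridePredictor(first_addr=100)
# Hypothetical inner loop over addresses 100..132 (stride 8), then a restart at 100.
issued = [p.access(a) for a in (108, 116, 124, 132, 100, 108, 116)]
print(issued)  # [None, None, 132, 140, 108, 116, 124]
```

In this trace the restart at address 100 lowers the count from three to two, which still meets the upper limit, so the retained stride of 8 is used to prefetch address 108 without re-learning the stride.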

FIG. 4 illustrates an example block diagram of a stride prediction prefetch system
400 in accordance with this invention. A fetch controller 430 effects prefetching of data
from a memory 450 to a cache 460, based on the contents of a control register 410 and the
data access requests from a processor 420. The address of the requested data is used to
determine whether the application is requesting data in a repetitive manner, exhibiting a
consistent stride, or span of locations, between memory accesses, as discussed above. The
control register 410 includes an address of a prior data access 416, the prior stride 412, and
a counter 414. As noted above, two counters may be used in lieu of the single counter 414,
to maintain an independent count of equal-stride and non-equal-stride accesses. The
address of the currently requested data from the processor 420 is compared to the prior
address 416 to determine a current stride. If the current stride corresponds to the prior
stride 412, the counter 414 is incremented; otherwise, it is decremented. Depending upon the
incremented or decremented value of the counter 414, the fetch controller 430 determines
whether to initiate a prefetch of data from a next-predicted location in the memory 450 to
the cache 460. When a subsequent data request is for data that has been prefetched into the
cache 460, the cache 460 provides the data directly to the processor 420, thereby avoiding the
delay introduced by retrieving the data from the memory 450.

Also dependent upon the incremented or decremented value of the counter 414, the
fetch controller 430 determines whether to modify the stride value 412, as detailed above.
By determining whether to modify the stride value 412 based on a count that depends upon
the number of non-equal strides, rather than modifying the stride value based on the
occurrence of one non-equal stride as in a conventional system, the stride prediction
prefetch system 400 is de-sensitized to intermittent breaks in stride.
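
The effect of this de-sensitization can be illustrated by running a conventional two-in-a-row check and the counter-based scheme over the same access trace. The sketch below is illustrative only: the function names, the trace, and the modeling of the cache as a set of prefetched addresses are assumptions, as are the initial unknown stride (`None`) and the use of the retained stride value to form the prefetch address.

```python
def two_in_a_row(trace):
    """Conventional check: count accesses that were already prefetched."""
    old_addr, old_stride, prefetched, hits = trace[0], None, set(), 0
    for addr in trace[1:]:
        hits += addr in prefetched
        stride = addr - old_addr
        if stride == old_stride:
            prefetched.add(addr + stride)   # prefetch next-predicted address
        else:
            old_stride = stride             # a single break replaces the stride
        old_addr = addr
    return hits


def counter_based(trace, max_count=3, upper=2, lower=1):
    """Counter-based check: the stride is retained across intermittent breaks."""
    old_addr, stride, count, prefetched, hits = trace[0], None, 0, set(), 0
    for addr in trace[1:]:
        hits += addr in prefetched
        current = addr - old_addr
        if current == stride:
            count = min(count + 1, max_count)
        else:
            count = max(count - 1, 0)
        if count >= upper:
            prefetched.add(addr + stride)   # prefetch using the retained stride
        elif count <= lower:
            stride = current                # modify the stride value
        old_addr = addr
    return hits


# Hypothetical trace: 6 columns of each of 3 rows of a row-major array
# with 8 four-byte elements per row, so each row change breaks the stride.
trace = [(r * 8 + c) * 4 for r in range(3) for c in range(6)]
print(two_in_a_row(trace), counter_based(trace))  # 9 12
```

On this trace the conventional check finds 9 of the 17 later accesses already prefetched, and the counter-based scheme finds 12, because the count remains at or above the upper limit across each row change.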

The foregoing merely illustrates the principles of the invention. It will thus be 
appreciated that those skilled in the art will be able to devise various arrangements which, 
although not explicitly described or shown herein, embody the principles of the invention 
and are thus within the spirit and scope of the following claims. 
CLAIMS

1. A method of prefetching data from a memory to a cache, comprising:

determining a first measure of equal-stride memory accesses, based on a plurality
of equal-stride memory accesses and a prior stride value,

determining a second measure of non-equal-stride memory accesses, based on a
plurality of non-equal-stride memory accesses and the prior stride value,

effecting a prefetch of the data from the memory based on the first measure, and

effecting a modification of the prior stride value based on the second measure.

2. The method of claim 1, wherein 

determining the first measure and the second measure is effected by maintaining a 
count that is incremented for each equal-stride memory access and decremented for each 
non-equal-stride memory access, and 

effecting the prefetch and effecting the modification are each based on the count. 

3. The method of claim 2, wherein 

effecting the prefetch occurs when the count is equal to or above an upper limit, and
effecting the modification occurs when the count is equal to or below a lower limit.

4. The method of claim 3, wherein 

the count is limited to a maximum of three, 
the upper limit is two, and 
the lower limit is one. 
5. A prefetch system comprising:

a control register that is configured to contain at least one measure that corresponds 
to a consistency of stride values between requested memory accesses, 

a prefetch controller that is configured to prefetch data from a memory to a cache, 
based on the measure of consistency, 

wherein 

the consistency of stride values is dependent upon a comparison of a current stride 
with a prior stride value, 

the prefetch controller is further configured to modify the prior stride value based 
on a measure of inconsistency, and 

the measure of inconsistency is based on a plurality of non-equal-strides between 
requested memory accesses. 

6. The prefetch system of claim 5, wherein 

the measure of consistency and the measure of inconsistency correspond to a count
that is incremented upon each equal-stride requested memory access, up to a maximum
count, and decremented upon each non-equal-stride requested memory access, down to a
minimum count.



7. The prefetch system of claim 6, wherein 

the prefetch controller is configured to: 

prefetch the data when the count is equal to or above an upper threshold level,
and

modify the prior stride value when the count is equal to or below a lower
threshold level.

8. The prefetch system of claim 7, wherein 

the maximum count is three, 
the upper threshold level is two, 
the lower threshold level is one, and 
the minimum count is zero. 
9. A processing system, comprising:

a memory that is configured to provide access to data based on an access address, 

a cache, operably coupled to the memory, that is configured to store data that is 
accessed from the memory, to facilitate rapid access to the data, 

a processor, operably coupled to the memory and the cache, that is configured to 
provide the access address and to receive the data from the cache, if it is stored in the 
cache, or from the memory, if it is not stored in the cache, and 

a fetch controller, operably coupled to the processor, the memory, and the cache, 
that is configured to effect a transfer of data from the memory to the cache, based on the 
access address and a predicted stride value, 

wherein 

the fetch controller is further configured to maintain 

a measure of stride consistency that is based on repeat occurrences of equal 
stride values, and 

a measure of stride inconsistency that is based on repeat occurrences of 
unequal stride values, and 

the fetch controller effects the transfer of data based on the measure of stride 
consistency, and effects a modification of the predicted stride value based on the measure 
of stride inconsistency. 

10. The processing system of claim 9, wherein 

the measure of stride consistency and the measure of stride inconsistency are each 
based on a count that is incremented upon each equal-stride requested memory access, up
to a maximum count, and decremented upon each non-equal-stride requested memory
access, to a minimum count.
11. The processing system of claim 10, wherein

the fetch controller is configured to:

effect the transfer of data when the count is equal to or above an upper
threshold level, and

effect the modification of the predicted stride value when the count is equal to
or below a lower threshold level.

12. The processing system of claim 11, wherein

the maximum count is three,
the upper threshold level is two,
the lower threshold level is one, and
the minimum count is zero.
Abstract

A prefetching system (400) includes hysteresis in the determination and
modification of a stride value (412) that is used for prefetching data in a sequential
process. Once a stride value is determined, intermittent stride inconsistencies are ignored
(322-330), and the stride value retains its prior value. When the stride inconsistencies
become frequent (322-330), the stride value is modified (230). When the modified stride
value becomes repetitive, the system adopts this value as the stride, and subsequent stride
inconsistencies are again ignored, and the stride value thereafter retains its current value
until inconsistencies become frequent.